The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about common practice or about the bottlenecks the community faces in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, and algorithm characteristics. A median of 72% of challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive for participation (70%), while prize money played only a minor role (16%). Although a median of 80 working hours was spent on method development, a large portion of participants (32%) stated that they did not have enough time for it, and 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants, and only 50% of the participants performed ensembling, based either on multiple identical models (61%) or on heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
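As an illustration of the patch-based strategy mentioned above, the following is a minimal sketch of how a volume too large to process at once might be tiled into overlapping patches. The function names, volume shape, patch size, and stride are all hypothetical and are not taken from the survey.

```python
import itertools

def patch_starts(length, patch, stride):
    """Start indices of patches covering one axis of `length` voxels."""
    if patch >= length:
        return [0]  # a single patch already spans the whole axis
    starts = list(range(0, length - patch + 1, stride))
    if starts[-1] != length - patch:
        starts.append(length - patch)  # last patch sits flush with the border
    return starts

def patch_grid(shape, patch_shape, stride):
    """Corner coordinates of all patches tiling a volume of the given shape."""
    axes = [patch_starts(n, p, s) for n, p, s in zip(shape, patch_shape, stride)]
    return list(itertools.product(*axes))

# Hypothetical 3D volume of 100 x 100 x 40 voxels, tiled with 64 x 64 x 32 patches.
corners = patch_grid(shape=(100, 100, 40), patch_shape=(64, 64, 32), stride=(32, 32, 16))
print(len(corners), corners[0], corners[-1])
```

Each corner, together with the patch shape, identifies one sub-volume that can be fed to a model independently. How overlapping predictions are merged afterwards is a separate design choice not covered here.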
Translated by Google Translate
This paper proposes Mutual Information Regularized Assignment (MIRA), a pseudo-labeling algorithm for unsupervised representation learning inspired by information maximization. We formulate online pseudo-labeling as an optimization problem: find pseudo-labels that maximize the mutual information between the label and the data while staying close to a given model probability. We derive a fixed-point iteration method and prove its convergence to the optimal solution. In contrast to baselines, MIRA combined with pseudo-label prediction enables simple yet effective clustering-based representation learning without extra training techniques or artificial constraints such as sampling strategies or equipartition constraints. With relatively few training epochs, the representation learned by MIRA achieves state-of-the-art performance on various downstream tasks, including linear/k-NN evaluation and transfer learning. In particular, with only 400 epochs, our method applied to the ImageNet dataset with a ResNet-50 architecture achieves 75.6% linear evaluation accuracy.
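To make the mutual-information term concrete, here is a minimal sketch of the empirical mutual information between soft pseudo-labels and data, assuming a uniform distribution over samples. It is not MIRA's fixed-point iteration, which the abstract does not spell out; it only evaluates the quantity being maximized. The function names are illustrative.

```python
import math

def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(q * math.log(q) for q in p if q > 0.0)

def mutual_information(assignments):
    """Empirical I(label; data) for soft pseudo-labels.

    `assignments` holds one probability vector over K clusters per sample.
    With samples weighted uniformly:
        I = H(mean assignment) - mean(H(assignment_i)).
    """
    n = len(assignments)
    k = len(assignments[0])
    marginal = [sum(row[j] for row in assignments) / n for j in range(k)]
    conditional = sum(entropy(row) for row in assignments) / n
    return entropy(marginal) - conditional

balanced = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]  # confident and evenly spread
collapsed = [[1.0, 0.0]] * 4                                 # every sample in one cluster
uncertain = [[0.5, 0.5]] * 4                                 # no sample is committed

for name, w in [("balanced", balanced), ("collapsed", collapsed), ("uncertain", uncertain)]:
    print(name, round(mutual_information(w), 4))
```

The balanced assignment reaches the maximum of log 2 for two clusters, while both the collapsed and the uncertain assignments score zero. This is why maximizing mutual information discourages degenerate pseudo-labels without an explicit equipartition constraint.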
Open world object detection aims to detect objects that are absent from the object classes of the training data, labeling them as unknown objects without explicit supervision. Furthermore, once the corresponding annotations are given incrementally, the exact classes of the unknown objects must be identified without catastrophic forgetting of the previously known classes. In this paper, we propose a two-stage training approach named Open World DETR for open world object detection based on Deformable DETR. In the first stage, we pre-train a model on the current annotated data to detect objects from the current known classes, and concurrently train an additional binary classifier to classify predictions into foreground or background classes. This helps the model build unbiased feature representations that can facilitate the detection of unknown classes in the subsequent process. In the second stage, we fine-tune the class-specific components of the model with a multi-view self-labeling strategy and a consistency constraint. Furthermore, we alleviate catastrophic forgetting when the annotations of the unknown classes become available incrementally by using knowledge distillation and exemplar replay. Experimental results on PASCAL VOC and MS-COCO show that our proposed method outperforms other state-of-the-art open world object detection methods by a large margin.
We present SpeechMatrix, a large-scale multilingual corpus of speech-to-speech translations mined from real speech in European Parliament recordings. It contains speech alignments in 136 language pairs with a total of 418 thousand hours of speech. To evaluate the quality of this parallel speech, we train bilingual speech-to-speech translation models on mined data only and establish extensive baseline results on the EuroParl-ST, VoxPopuli and FLEURS test sets. Enabled by the multilinguality of SpeechMatrix, we also explore multilingual speech-to-speech translation, a topic that few other works have addressed. We also demonstrate that model pre-training and sparse scaling using Mixture-of-Experts bring large gains in translation performance. The mined data and models are freely available.
Artificial Intelligence (AI) is having a tremendous impact across most areas of science. Applications of AI in healthcare have the potential to improve our ability to detect, diagnose, prognose, and intervene on human disease. For AI models to be used clinically, they need to be made safe, reproducible and robust, and the underlying software framework must be aware of the particularities (e.g. geometry, physiology, physics) of the medical data being processed. This work introduces MONAI, a freely available, community-supported, and consortium-led PyTorch-based framework for deep learning in healthcare. MONAI extends PyTorch to support medical data, with a particular focus on imaging, and provides purpose-specific AI model architectures, transformations and utilities that streamline the development and deployment of medical AI models. MONAI follows best practices for software development, providing an easy-to-use, robust, well-documented, and well-tested software framework. MONAI preserves the simple, additive, and compositional approach of its underlying PyTorch libraries. MONAI is being used by and receiving contributions from research, clinical and industrial teams from around the world, who are pursuing applications spanning nearly every aspect of healthcare.
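Driving style is a key factor in the acceptance and comfort of automated vehicle features. A mismatch between the automated driving style and the one the driver prefers can lead users to take over more frequently or even to disable the automation features. This work proposes identifying users' driving style preferences from multimodal signals, so that the vehicle can match those preferences continuously and automatically. We conducted a driving simulator study with 36 participants and collected extensive multimodal data, including behavioral, physiological, and situational data. These include eye gaze, steering grip force, driving maneuvers, brake and throttle pedal inputs, foot distance from the pedals, pupil diameter, galvanic skin response, heart rate, and the situational driving context. We then built machine learning models to identify the preferred driving style, and confirmed that all modalities are important for identifying user preference. This work paves the way for implicit adaptive driving styles in automated vehicles.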
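In conventional multi-user multiple-input multiple-output (MU-MIMO) systems with frequency division duplexing (FDD), the channel acquisition and precoder optimization processes have been designed separately even though they are highly coupled. This paper studies an end-to-end design of downlink MU-MIMO systems that includes pilot sequences, limited feedback, and precoding. To address this problem, we propose a novel deep learning (DL) framework that jointly optimizes the generation of feedback information at the users and the precoder design at the base station (BS). Each procedure in the MU-MIMO system is replaced by multiple intelligently designed deep neural network (DNN) units. At the BS, a neural network generates the pilot sequences and helps the users obtain accurate channel state information. At each user, the channel feedback operation is carried out in a distributed manner by an individual user DNN. Another BS DNN then collects the feedback information from the users and determines the MIMO precoding matrices. A joint training algorithm is proposed to optimize all DNN units in an end-to-end manner. In addition, a training strategy is proposed that avoids retraining for different network sizes, enabling a scalable design. Numerical results demonstrate the effectiveness of the proposed DL framework compared with classical optimization techniques and other conventional DNN schemes.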
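One of the key topics in network security research is the automated COA (course of action) attack search method. Traditional COA attack search methods, which search for attacks passively, can be difficult to apply, especially as networks grow larger. To address these issues, new automated COA techniques are being developed. Among them, this paper designs an intelligent spatial algorithm that operates efficiently in scalable networks. In addition to the spatial search, a Monte Carlo (MC)-based temporal approach is considered to handle time-varying network behavior. We therefore propose a spatio-temporal attack COA search algorithm for scalable and time-varying networks.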
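Recent deep learning models have achieved high performance in speech enhancement. However, obtaining a fast, low-complexity model without significant performance degradation remains a challenge. Previous knowledge distillation studies on speech enhancement could not solve this problem because their output distillation methods do not fit the speech enhancement task in some respects. In this study, we propose multi-view attention transfer (MV-AT), a feature-based distillation method, to obtain efficient speech enhancement models in the time domain. Based on a multi-view feature extraction model, MV-AT transfers the multi-view knowledge of the teacher network to the student network without additional parameters. Experimental results show that the proposed method consistently improves the performance of student models of various sizes on the Valentini and deep noise suppression (DNS) datasets. Many-S-8.1GF with our proposed method, a lightweight model for efficient deployment, achieves 15.4× and 4.71× fewer parameters and floating-point operations (FLOPs), respectively, than the baseline model with similar performance.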
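In this paper, we introduce a novel approach for systematically and efficiently solving the dataset condensation problem by exploiting the regularity in a given dataset. Instead of condensing the dataset directly in the original input space, we assume a generative process for the dataset in which a set of learnable codes is defined in a compact latent space, followed by a set of tiny decoders that map them to the original input space. By combining different codes and decoders interchangeably, we can greatly increase the number of synthetic examples at the same parameter count, because the latent space is much lower-dimensional and because we can assume as many decoders as necessary to capture the different styles represented in the dataset at negligible cost. This knowledge factorization allows information to be shared efficiently among synthetic examples in a systematic way, yielding a much better trade-off between compression ratio and the quality of the generated examples. We show experimentally that our method achieves new state-of-the-art records on various benchmark datasets such as SVHN, CIFAR10, CIFAR100, and TinyImageNet.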
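The counting argument behind combining codes and decoders can be sketched as follows. This is a toy illustration only: the decoders here are fixed linear maps with made-up weights, whereas the paper's decoders are learned networks whose form the abstract does not specify. All sizes and names are hypothetical.

```python
import itertools

def decode(code, decoder):
    """Apply a tiny linear decoder (a weight matrix given as a list of rows) to a latent code."""
    return [sum(w * c for w, c in zip(row, code)) for row in decoder]

# Hypothetical toy sizes: 4 latent codes of dimension 2, 3 decoders mapping to dimension 8.
code_dim, input_dim = 2, 8
codes = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]]
decoders = [
    [[float((d + 1) * (i + 1)), float(d - i)] for i in range(input_dim)]
    for d in range(3)
]

# Every (code, decoder) pair yields one synthetic example in input space.
synthetic = [decode(c, d) for c, d in itertools.product(codes, decoders)]

# Numbers stored in the factorized form versus storing each example directly.
stored = len(codes) * code_dim + len(decoders) * input_dim * code_dim
direct = len(synthetic) * input_dim
print(len(synthetic), stored, direct)
```

With 4 codes and 3 decoders, 12 synthetic examples come from 56 stored numbers instead of 96. The gap widens as the input dimension grows relative to the latent dimension, since each extra code costs only `code_dim` numbers yet adds one example per decoder.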